
    Deep learning for the early detection of harmful algal blooms and improving water quality monitoring

    Climate change will affect how water sources are managed and monitored. The frequency of algal blooms will increase with climate change, as it presents favourable conditions for the reproduction of phytoplankton. During monitoring, sensor failures in monitoring systems can leave data only partially filled, which may affect critical systems. Imputation therefore becomes necessary to decrease error and increase data quality. This work investigates two issues in water quality data analysis: improving data quality and anomaly detection. It consists of three main topics: data imputation, early algal bloom detection using in-situ data, and early algal bloom detection using multiple modalities. The data imputation problem is addressed by experimenting with various methods on a water quality dataset that covers four locations around the North Sea and the Irish Sea with different characteristics and high miss rates, which tests model generalisability. A novel neural network architecture with self-attention is proposed in which imputation is done in a single pass, reducing execution time. The self-attention components increase the interpretability of the imputation process at each stage of the network, providing knowledge to domain experts. After data curation, algal activity is predicted from 1 to 7 days ahead using transformer networks, and the importance of each input with regard to the output of the prediction model is explained using SHAP, with the aim of explaining model behaviour to domain experts, an aspect overlooked in previous approaches. The prediction model improves bloom detection performance by 5% on average, and the explanation summarises the complex structure of the model as input-output relationships. The initial unimodal bloom detection model is then improved by incorporating multiple modalities into the detection process; these modalities were previously used only for validation. The problem of missing data is also tackled by using coordinated representations, replacing low-quality in-situ data with satellite data and vice versa, instead of imputation, which may produce biased results.

    Towards the Automatic Extraction of Plant Traits from Textual Descriptions

    Many ecological restoration programmes are informed by evidence coming from empirical research. Specifically, such programmes analyse species traits in order to differentiate species that are suitable for restoration from unsuitable ones. Indeed, understanding plant traits (and their relationships with each other) informs research into vegetation modelling and environmental change prediction, which in turn helps in answering many ecological questions. In 2006, the Center for Tropical Forest Science (CTFS) formulated recommendations in support of their research programme, the foremost of which is the creation of trait databases by building upon published information catalogued by existing herbaria. In this work, we aim to enrich World Flora Online (WFO), a web-based inventory of known plant species, by integrating trait information contained in data sets coming from botanical institutions all over the world. This poses a few challenges, as trait information tends to be buried within verbose textual descriptions that do not conform to conventions of writing. Specifically, the descriptions typically do not come in the form of full sentences and read like long-winded enumerations of various types of plant attributes or characteristics. Such descriptions are difficult to search and understand unless decomposed into meaningful units. In order to decompose textual descriptions of plant species into spans pertaining to specific types of attributes, we have developed a machine learning-based approach to automatic text segmentation. Casting the problem as a sequence labelling task, we have investigated a number of probabilistic classifiers including conditional random fields (CRFs), hidden Markov models (HMMs) and naïve Bayes (NB). To train our models, we utilised data contributed by the South African National Biodiversity Institute (SANBI), in which traits are labelled with one of the following trait categories: morphology, habitat and distribution. To help the models discriminate between these categories, we designed features capturing word characteristics (e.g., n-grams at the character and word level), context (i.e., surrounding words within a predefined window), as well as domain knowledge (i.e., words that match terms in plant-related ontologies). In this way, we can automatically elucidate exactly which parts of the original descriptions pertain to plant traits such as morphology, habitat or distribution. By applying the resulting models to textual descriptions coming from several botanical institutes, we can facilitate the automatic population of WFO with plant traits for a number of species.
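
The three feature families named in the abstract (word characteristics, context window, domain knowledge) could be sketched per token roughly as follows. This is only an illustration, not the authors' implementation: the function names, the window size, the restriction to character trigrams, and the tiny `ONTOLOGY_TERMS` set are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for terms drawn from a plant-related ontology.
ONTOLOGY_TERMS: Set[str] = {"leaf", "leaves", "petal", "stem", "fynbos"}


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Return the character n-grams of a word (the word itself if no longer than n)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i].lower()
    feats: Dict[str, object] = {
        "word": word,                           # word-level characteristic
        "in_ontology": word in ONTOLOGY_TERMS,  # domain-knowledge feature
    }
    # Character-level n-grams of the current word.
    for gram in char_ngrams(word):
        feats[f"char3={gram}"] = True
    # Surrounding words within a fixed window, padded at the boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"ctx[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


# Illustrative description fragment (invented for this example).
tokens = "Leaves ovate ; grows on rocky slopes".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this shape, one per token, are a common input format for sequence labellers such as CRFs, which would then assign each token a category like morphology, habitat or distribution.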